The Hadoop Distributed File System: Balancing Portability
Author
Abstract
Hadoop is a software framework that supports data-intensive distributed applications. Hadoop creates clusters of machines and coordinates the work among them. It includes two major components: HDFS (the Hadoop Distributed File System) and MapReduce. HDFS is designed to store large amounts of data reliably and to provide that data with high availability to user applications running at clients. It splits files into multiple data blocks and stores each block redundantly across a pool of servers to enable reliable and extremely rapid computation. MapReduce is a software framework for analyzing and transforming very large data sets into a desired output. This paper focuses on how replicas are managed in HDFS to provide high availability of data under extreme computational requirements. It then examines the possible failures that can affect a Hadoop cluster and the failover mechanisms that can be deployed to protect it.
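To make the replication mechanism concrete, here is a minimal sketch using the standard Hadoop Java client API to raise a file's replication factor and report which hosts hold each block's replicas. It is illustrative only and not taken from the paper; the path /data/input.txt is hypothetical, and the cluster address is assumed to come from the usual core-site.xml configuration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationInspector {
    public static void main(String[] args) throws Exception {
        // Connect to the default file system (HDFS when fs.defaultFS
        // in core-site.xml points at a NameNode).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/data/input.txt"); // hypothetical path
        // Ask the NameNode for three replicas per block; DataNodes
        // re-replicate in the background until the target is met.
        fs.setReplication(file, (short) 3);

        // Report where each block's replicas currently live.
        FileStatus status = fs.getFileStatus(file);
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.printf("offset %d -> replicas on %s%n",
                    loc.getOffset(), String.join(", ", loc.getHosts()));
        }
    }
}
```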
Similar Resources
An Improved Hadoop Data Load Balancing Algorithm
Data load balancing is one of the key problems of big data technology. As a big data platform, Hadoop has had many successful applications. HDFS, the Hadoop Distributed File System, has a load balancing procedure that can balance the storage load on each machine. However, this method cannot balance overloaded racks preferentially, so it is likely to cause the breakdown of overloaded m...
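The paper's algorithm is not reproduced here, but a toy heuristic in the same spirit is sketched below: before moving blocks between individual nodes, identify the racks whose average utilization exceeds a threshold and balance those first. All names and the 0.8 threshold are illustrative assumptions.

```java
import java.util.*;

// Hypothetical sketch: choose which racks a balancer should drain
// first, visiting overloaded racks before individual nodes.
public class RackAwareBalancerSketch {
    record Node(String host, String rack, double usedBytes, double capacityBytes) {
        double utilization() { return usedBytes / capacityBytes; }
    }

    static List<String> racksToBalanceFirst(List<Node> nodes, double overloadThreshold) {
        // Accumulate per-rack utilization: rack -> {sum, count}.
        Map<String, double[]> perRack = new HashMap<>();
        for (Node n : nodes) {
            double[] acc = perRack.computeIfAbsent(n.rack(), r -> new double[2]);
            acc[0] += n.utilization();
            acc[1] += 1;
        }
        // Racks above the threshold come first, most loaded at the front.
        return perRack.entrySet().stream()
                .filter(e -> e.getValue()[0] / e.getValue()[1] > overloadThreshold)
                .sorted((a, b) -> Double.compare(b.getValue()[0] / b.getValue()[1],
                                                 a.getValue()[0] / a.getValue()[1]))
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
                new Node("dn1", "/rack1", 90, 100),
                new Node("dn2", "/rack1", 85, 100),
                new Node("dn3", "/rack2", 30, 100));
        System.out.println(racksToBalanceFirst(cluster, 0.8)); // prints [/rack1]
    }
}
```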
Stabilizing Load across Cloud for Distributed File Access
Distributed file systems in clouds, such as GFS (Google File System) and HDFS (Hadoop Distributed File System), rely on central servers to manage the metadata and the load balancing. DFS are key building blocks for cloud computing applications based on the MapReduce programming model. In these file systems, nodes simultaneously provide computing and storage functions; a file is divided into a number of parts alloc...
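As a rough illustration of the chunking step these systems perform, the sketch below splits a file into fixed-size parts and spreads them over storage nodes round-robin. It is a simplification under assumed sizes and node names, not the actual placement policy of GFS or HDFS.

```java
import java.util.*;

// Hypothetical sketch: divide a file into fixed-size chunks and
// assign each chunk to a storage node in round-robin order.
public class ChunkPlacementSketch {
    static Map<Integer, String> placeChunks(long fileBytes, long chunkBytes, List<String> nodes) {
        Map<Integer, String> placement = new LinkedHashMap<>(); // chunk index -> node
        int chunks = (int) Math.ceil((double) fileBytes / chunkBytes);
        for (int i = 0; i < chunks; i++) {
            // Round-robin keeps per-node chunk counts within one of each other.
            placement.put(i, nodes.get(i % nodes.size()));
        }
        return placement;
    }

    public static void main(String[] args) {
        // A 300 MB file with 64 MB chunks on three nodes (all values illustrative).
        System.out.println(placeChunks(300L << 20, 64L << 20, List.of("n1", "n2", "n3")));
    }
}
```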
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing HDFS rack-aware data placement strategy for MapReduce assume a homogeneous cluster in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...
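One way to picture what such an adaptive placement might do differently is sketched below: blocks are allotted to nodes in proportion to an assumed computing-capacity ratio rather than uniformly. This is not the paper's algorithm; the capacity figures and the drift-handling rule are illustrative assumptions.

```java
import java.util.*;

// Hypothetical sketch: distribute a block count across nodes in
// proportion to each node's relative computing capacity.
public class CapacityAwarePlacementSketch {
    static Map<String, Long> blocksPerNode(long totalBlocks, Map<String, Double> capacityRatio) {
        double totalCapacity = capacityRatio.values().stream()
                .mapToDouble(Double::doubleValue).sum();
        Map<String, Long> share = new LinkedHashMap<>();
        long assigned = 0;
        for (Map.Entry<String, Double> e : capacityRatio.entrySet()) {
            long n = Math.round(totalBlocks * e.getValue() / totalCapacity);
            share.put(e.getKey(), n);
            assigned += n;
        }
        // Give any rounding drift to the first node; a real policy
        // would spread it more carefully.
        share.merge(capacityRatio.keySet().iterator().next(), totalBlocks - assigned, Long::sum);
        return share;
    }

    public static void main(String[] args) {
        // Illustrative capacities: "fast" is twice as capable as "slow".
        Map<String, Double> capacities = new LinkedHashMap<>();
        capacities.put("fast", 2.0);
        capacities.put("slow", 1.0);
        System.out.println(blocksPerNode(90, capacities)); // prints {fast=60, slow=30}
    }
}
```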
Parallel Rule Mining with Dynamic Data Distribution under Heterogeneous Cluster Environment
Big data mining methods support knowledge discovery on highly scalable, high-volume, and high-velocity data. The cloud computing environment provides computational and storage resources for the big data mining process. Hadoop is a widely used parallel and distributed computing platform for big data analysis and manages both homogeneous and heterogeneous computing models. The MapReduce fra...
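Parallel rule mining typically rests on a distributed support-counting step. The sketch below shows what that step can look like in the standard Hadoop MapReduce API, counting item occurrences across transactions; the input format (one comma-separated transaction per line) is an assumption, and this is not the paper's mining algorithm.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Minimal Hadoop MapReduce sketch of support counting: how often
// each item appears across all transactions in the input.
public class ItemCountSketch {
    public static class ItemMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text item = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            // Emit (item, 1) for every item in the transaction line.
            for (String token : line.toString().split(",")) {
                item.set(token.trim());
                ctx.write(item, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text item, Iterable<IntWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            // Sum the partial counts to get the item's global support.
            int sum = 0;
            for (IntWritable c : counts) sum += c.get();
            ctx.write(item, new IntWritable(sum));
        }
    }
}
```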
Secure and Privacy-Preserving Distributed File Systems on Load Rebalancing in Cloud Computing
Distributed file systems in cloud computing, such as the Google File System (GFS) and the Hadoop Distributed File System (HDFS), rely on central servers to manage the metadata and the load balancing. This enabling technology for large-scale computation on big data originated at Google, with open source implementations by Yahoo and Facebook. Distributed file systems (DFS) are key building blocks for cl...